A FAST ALGORITHM FOR TOTAL VARIATION IMAGE RECONSTRUCTION 

FROM RANDOM PROJECTIONS 
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Abstract. Total variation (TV) regularization is popular in image restoration and reconstruction due to its ability to 
preserve image edges. To date, most research activities on TV models concentrate on image restoration from blurry and 
noisy observations, while discussions on image reconstruction from random projections are relatively fewer. In this paper, 
we propose, analyze, and test a fast alternating minimization algorithm for image reconstruction from random projections 
via solving a TV regularized least-squares problem. The per-iteration cost of the proposed algorithm involves a linear time 
shrinkage operation, two matrix-vector multiplications and two fast Fourier transforms. Convergence, certain finite convergence 
and g-linear convergence results are established, which indicate that the asymptotic convergence speed of the proposed algorithm 
depends on the spectral radii of certain submatrix. Moreover, to speed up convergence and enhance robustness, we suggest 
an accelerated scheme based on an inexact alternating direction method. We present experimental results to compare with 
an existing algorithm, which indicate that the proposed algorithm is stable, efficient and competitive with TwIST [3] — a 
state-of-the art algorithm for solving TV regularization problems. 
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1. Introduction. Image restoration and reconstruction play important roles in medical and astronom- 

2 

ical imaging, image and video coding, file restoration, and many other applications. Let u G W 1 be an 

2 

original n x n image, A G ]R mxn be a linear operator, and / G W 71 be an observation which satisfies the 
relationship 

f = N(Au) GM m , (1.1) 

where N(-) represents a noise contamination or corruption procedure. Given A, image restoration and 
reconstruction extract u from /, which is either under-determined (m < n 2 ) or ill-possed (e.g., deconvo- 
lution/deblurring), making classical least-squares approximation alone not suitable. To stabilize recovery, 
regularization technique is frequently used, giving a general reconstruction model of the form 

min$ reg (?i) + fjL$& d (Au - /), (1.2) 

u 

where <& Tez (u) promotes solution regularity such as smoothness and sparseness, ^^{Au — f) fits the observed 
data by penalizing the difference between Au and /, and \i > balances the two terms for minimization. 
The choice of 3>fid(-) depends on different noise, e.g., the squared £2 penalty is usually used for Gaussian 
additive noise, while the t\ penalty is more appropriate for certain non-Gaussian noise, e.g., salt-and-pepper 
noise. Throughout this paper, we assume that N(-) represents an additive Gaussian noise contamination 
and thus set 3>fid(-) = || ■ Hi- Among other regularization, total variation (TV) has been popular ever since 
its introduction by Rubin, Osher and Fatemi [27] . The remarkable property of TV is to preserve edges due 
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to its linear penalty on differences between adjacent pixels. The most widely studied TV model for image 
deconvolution (in which case A G ]R n2xn2 is a convolution matrix) is 

mm^J|A W ||2 + fP«-/|||, (1.3) 

where Di G R 2xn2 denotes the local finite difference operator (with ceratin boundary conditions) at pixel 
z, and \\Diu\\2 is a discretization of the TV of u. In this paper, we propose, analyze and test a fast 
alternating minimization algorithm for solving (jl.3p . in which A is a compressive sensing encoding matrix 
(m < n 2 ) and does not have structures. 

In the following of this section, we review briefly compressive sensing ideas and algorithms, which provide 
theoretical guarantee for image reconstruction via solving (|1.3|) , examine some existing algorithms for relevant 
TV problems, and describe the contributions and organization of this paper. Throughout this paper, we 
refer to (JOJ) as TV/L 2 . 

1.1. Compressive sensing — ideas and algorithms. Compressive sensing (CS) is an emerging 
methodology in digital signal processing brought to the research forefront by Donoho [11!, Candes, Romberg 
and Tao [6j[7], and has attracted intensive research activities in the past few years. In a nutshell, CS first 
encodes a sparse signal (possibly under certain sparsifying basis) through hardware devices into a relatively 
small number of linear projections and then reconstructs it from the limited measurements. Let x G IR n 
be the sparse signal that we wish to capture, i.e., the number of nonzeros in x is much less than its length 
n, and b = Ax represent a set of m (usually much smaller than n) linear projections of x. Under certain 
desirable conditions, it is shown that with high probability the basis pursuit problem 

min||x||i s.t. Ax = fr, (1-4) 

X 

yields the sparsest solution of the linear system Ax = 6, see [12J. More often than not, b contains noise, in 
which case certain relaxation is desirable. For white Gaussian noise, the most widely used models are the 
basis pursuit denoising problem 

min\\x\\ 1 + ^\\Ax-b\\l (1.5) 

X Z 

where, roughly speaking, \i > is inversely proportional to the noise level, and its variants. Since most 
signals of interests are sparse or nearly sparse (called compressible) under certain basis, the CS idea has 
extremely wide applications. Recent results show that stable reconstruction can be obtained provided that 
A possesses certain randomness. It has been clear from [41] that for almost all random matrices the exact 
recoverability is approximatively identical. Moreover, exact recoverability is attainable when A contains 
randomly taken rows from orthonormal matrices, e.g., partial Fourier which arises from magnetic resonance 
imaging [22J. 

In the application of CS, matrix A is large and dense. Furthermore, in certain applications A contains 
structures that allow fast matrix- vector multiplication, e.g., A is a partial Fourier matrix as in MRI. These 
features make traditional powerful optimization approaches such as interior point methods not suitable. In 
comparison, first-order algorithms that depend on merely matrix- vector multiplications are more desirable. 
Therefore, in the last few years numerous algorithms have been proposed for recovering sparse signals via 
solving certain ^i-norm regularized problems including (|1.4|) . ([1.5)) and thier variants. Several well-known 
approaches in this area include the gradient projection method [15], the fixed-point continuation method 
[19], the spectral projected gradient method [30], and the Bregman iterative method [231 SQl [23 SI IS] • More 
recent algorithms can be found in [2j [8l [35j [37] . 
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1.2. Some existing algorithms for TV/L 2 . The advantage of TV regularization compared with 
Tikhonov-like [29] regularization in recovering high quality image is not without a price. The nondifferentia- 
bility of TV causes the main difficulty. In addition, problems arising from signal and image reconstruction 
are usually large scale and ill-possed, which further make TV models difficult to be solved efficiently. Since 
the introduction of TV regularization, many algorithms have been proposed for solving (jl.3j) and its variants. 
In the pioneer work [27], a time-marching scheme was used to solve a partial differential equation system, 
which in optimization point of view is equivalent to a constant step-length gradient descent method. This 
time-marching scheme suffers slow convergence especially when the iterate point approaches the solution set. 
Another well-known method is the linearized gradient method proposed in [31] for denoising and in [32] for 
deblurring, which solves the Euler-Lagrangian equation via a fixed-point iteration. At each iteration of the 
linearized gradient method, a linear system needs to be solved, which makes the per-iteration cost extremely 
expensive especially when the problem becomes more ill-conditioned. To overcome the linear convergence of 
first-order methods, the authors of [31] incorporated Newton method to solve (jl.3p . which achieved superlin- 
ear convergence at the cost of solving a large linear system at each iteration. Another important approach 
for TV problems is the iterative shrinkage/thresholding (1ST) method [13] [14] [28] . In [3], Bioucas-Dias and 
Figueiredo introduced a two-step 1ST (TwIST) algorithm, which exhibits much faster convergence than the 
primary 1ST algorithm for ill-conditioned problems. We note that IST-based algorithms require to solve a 
TV denoising subproblem at each iteration which requires its own iterations. 

Despite the progress have been achieved, algorithms for solving (jl.3|) are still much slower than those for 
Tikhonov regularization problems. Recently, a fast TV deconvolution (FTVd) method is proposed in [33] . 
which makes full use of problem structures (both A and finite difference operators have circulant structures 
under proper boundary conditions) and thus converges very fast. FTVd solves a penalty approximation of 
(fL3|) . that is 



where, for each z, G R 2 is an auxiliary variable and j3 > is a penalty parameter. The advantage of 
considering ([1.6)) is that it leads to fast and efficient alternating minimizations for deconvolution problems. 
The numerical results given in [33] indicates that FTVd is much faster than the lagged diffusivity method in 
[31], which is known to be efficient previously. For more details on the FTVd algorithm and its performance, 
see [33]. Given the practical efficiency of FTVd, this split and penalty idea has been extended to multichannel 
image restoration in [36] , impulsive noise elimination in [38] and medical reconstruction from partial Fourier 
coefficients in [39 . More algorithms for TV/L 2 problem can be found in [9] [10] [2]] [23] [26] [3l] an d references 
therein. 

1.3. Contributions. The purpose of this paper is to develop a fast algorithm for solving (|1.3)h where 
A is a general linear operator. Specifically, we are interested in compressive sensing encoding matrices in 
which case A contains smaller or even much smaller number of rows than columns. As is stated above, 
problem (|1.6|) admits fast alternating minimization when A is a convolution matrix. As a matter of fact, the 
minimization of (|1.6|) with respect to w^, i = 1, 2, . . . , n 2 , reduces to n 2 two-dimensional problems (no matter 
what A is), which can be solved easily and exactly in linear time. However, different from deconvolution 
problems, A does not have structures in our stated case. Consequently, the solution of ix-subproblems can 
not utilize any fast transforms. 

In this paper, we first introduce a fast alternating minimization scheme for solving (|1.6|) , which recurs 
to linearization and proximal techniques when solving the ix-subproblems. Under quite reasonable technical 




(1.6) 
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assumptions, we show that the proposed algorithm converges globally to a solution of (|1.6|) . Moreover, we 
establish (/-linear convergence results which indicate that the (/-linear factor depends on the spectral radius 
of certain submatrix. Clearly, the solution of (|1.6|) well approximates that of (|1.3p only when f3 is sufficiently 
large, which causes numerical difficulties in computation. To overcome this drawback, we introduce an 
inexact alternating direction method, which accelerates the convergence of the alternating minimization 
approach and converges to a solution of (jl.3p without driving f3 to infinity. Since the proposed algorithms 
solve ([1.6)) and (jl.3p with a CS encoding matrix, we name the resulting algorithms FTVCS. We present 
experimental results and compare with TwIST [3|. The comparison results indicate that FTVCS is fast and 
efficient and performs comparable with the state-of-the art algorithm TwIST. 

1.4. Notation and organization. Now, we define our notation. For scalars c^, vectors and 
matrices Mi of appropriate sizes, i = 1, 2, we let a = (c^ija^) — (<^i,«2) T , v = (^15^2) — ( v ii v J) T ^ 
and M = (Mi;M 2 ) = (M 1 T ,M 2 T ) T . Let L> (1) and L> (2) be the two first-order finite difference matrices in 
horizontal and vertical directions, respectively. As is used before, Di G l 2xn2 is a two-row matrix formed 
by stacking the ith row of on that of D^ 2 \ Throughout this paper, we let D = (DM;D&) G M 2n2xn2 , 
p(T) be the spectral radius of matrix T, and V(-) be the projection operator under Euclidean norm. The 

inner product of two vectors will be denoted by (u,v). In the rest of this paper, we let || • || = || • H2, and 

2 

without misleading we abbreviate Y17=i as Additional notation will be introduced when it occurs. 

The paper is organized as follows. In Section 12. 1[ we introduce our alternating minimization algorithm 
FTVCS and study its convergence properties. An accelerated scheme of FTVCS is proposed in Section [2^21 
by incorporating an inexact alternating direction technique. Numerical results in comparison with TwIST 
are presented in Section 03 Finally, we conclude the paper in Section HI 

2. Proposed algorithms. The task of this section is to construct our algorithm for solving (|1.3p . As is 

2 

stated above, our interest in this paper concentrates on CS encoding matrices, i.e., A G R mxn with m <C n. 
The non-smoothness of TV causes the main difficulty. Similar as in [33] , we first consider the approximation 
problem (|1.6|) and then propose an inexact alternating direction method for the solution of (|1.3p . 

2.1. Alternating minimization. The introduction of auxiliary variables w in (|1.6|) makes it easy to 
apply alternating minimization. It is easy to see that, for fixed u, the minimization of (|1.6|) with respect to 
w reduces to the following two-dimensional problems 



where the convention • (0/0) = is followed. On the other hand, for fixed w, the minimization of ([1.6)) 
with respect to u is a least squares problem, and the corresponding normal equations are given by 



min || || + - ||wi - A^|| 2 , i = 1, 2, . . . , n 2 , 



(2.1) 



for which the unique minimizers are given by the two-dimensional shrinkage formula 




(2.2) 




or equivalently, 




(2.3) 
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where w G M 2n is an reordering of w^, i = 1, 2, . . . , n 2 . It is well-known that, under the periodic boundary 
condition for u, D T D is a block-circulant matrix and can be diagonalized by two-dimensional fast Fourier 
transform (FFT). Unfortunately, the matrix A T A does not have circulant structures for general CS encoding 
matrices. Therefore, the exact solution of (j2.3p is expensive, which causes the main difficulty to apply 
alternating minimization directly. 

To avoid solution of linear system of equations at each iteration, we linearize — f\\ 2 at the current 

point u k and add a proximal term, resulting the following approximation problem 

™?E< (" w *ii + f i' w * - DiU \\ 2 ) + <" - yfc ) + h l]u ~ uk]l2 ) ' (2 - 4) 

where = A T \Au k — f) denotes the gradient of — f\\ 2 at u k , and r > is a parameter. Clearly, 

problem (|2.4p is equivalent to 

™E f (m + f " w * - DiU H 2 ) + 1> - " T5fe)l12 - (2 - 5) 

For fixed iu (or w), the minimization of (|2.5p with respect to iz is equivalent to 

^D T D+^-I^u = D T w + ^(u k -Tg k ), (2.6) 

where we recall that D = (-D^; D^). Under the periodic boundary conditions for u, the coefficient matrix 
in (|2.6p can be diagonalized easily by FFT. Consequently, the solution of (|2.6p can be accomplished by two 
FFTs (including one inverse FFT). To sum up, our alternating minimization algorithm, named fast total 
variation decoding from compressive sensing measurements or FTVCS, is described below. 
Algorithm 1 (FTVCS). Input f, A and /i,/3,r > 0. Initialize u° = f and k = 0. 
While "not converged" , Do 

1) Compute w k+1 according to \2. ty) for fixed u = u k . 

2) Compute u k+1 according to \2. 6\) for fixed w = w kJrl . 

3) fc = ]fe + l. 
End Do 

To establish the convergence of FTVCS, we need the following technical assumption. 

Assumption 1. Af(A) nJ\f(D) = {0} ; where Af(-) represents the null space of a matrix. 

Assumption [T] is a quite loose condition and commonly used in the convergence analyses of similar 
studies, see e.g., [33]. Under Assumption [TJ we have the following convergence results. 

Theorem 2.1. Under Assumption^ for any fixed f3 > and < r < 2/A max (A T A) ; where A max (A T A) 
denotes the spectral radius of A T A, the sequence {(w k ,u k )} generated by Algorithm^ from any starting point 
(w°,u°) converges to a solution (w*,u*) of (jl.6p . 

Theorem 2.2. Suppose the sequence {(w k ,u k )} generated by Algorithm^ converges to (w*,u*). Then, 
we have = w* = ; V i G L after a finite number of iterations, where L = {i : ||D^*|| < 

Theorem 2.3. Under the conditions of Theorem \2.1\ the sequence {u k } generated by Algorithm^ 
converges to {u*} q-linearly. 

The proofs of Theorems 12. 1[ 12.21 and 12.31 are given in Appendix [A] 

2.2. An accelerated scheme based on inexact alternating direction method. It is well-known 
that problem (|1.6|) well approximates (jl.3p only when f3 is sufficiently large. However, it is generally difficult 
to determine theoretically how large a j3 value must be to attain a given accuracy. In this section, we present 
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an inexact alternating direction method (ADM), which converges to a solution of (|1.3p without requiring /3 
goes to infinity. 

First, we review briefly the idea of ADM pioneered in [T7J [18]. The classical ADM is designed to solve 
the following structure optimization problem: 

min{0i(2/) + 2 {z) : Hy - z = 0} , (2.7) 

y,z 

where 6\ : W —> R and 2 : M £ — » M are functions, and H is a s x t matrix. Given z k G M £ and p fe G IR S , the 
ADM iterates as follows 

y k+1 <- argmin^(y) - (p fc ) T (ffy - z k ) + Uh V - z k \\ 2 , (2.8) 
y 2 

z k+1 <r- argmin<9 2 (z) - (p k ) T (Hy k+1 - z) + ^\\Hy k+1 - z\\ 2 , (2.9) 
z 2 

p k+1 ^p k - (j(Hy k+l - z k+1 ), (2.10) 

where a > is a parameter. In (|2.8)h p fc is the Lagrangian multiplier and <j severs as a penalty parameter. 
It can be shown that, under quite reasonable assumption, (|2.8|) converges to a solution of (|2.7p for any fixed 

a > 0, see na hhj. 

We now consider the model (|1.3|) in its equivalent form 

min { V. || Wi || + - /|| 2 : w, = A«, Vi j . (2.11) 
The augmented Lagrangian problem of (|2. 1 lj) is given by 

min£. f || Wi || - \J(wi - D lU ) + § || Wj - A«|| 2 ) + f \\Au - f\\ 2 , (2.12) 



where, for each z, A$ G M 2 is the Lagrangian multiplier attached to w$ = DiU. Inspired by the ADM 
iterations, for given (u k , w k , A fc ), we obtain the next triplet w k+1 , A fe+1 ) as follows. First, for fixed u k 

and A fc , the minimization of (j2.12p with respect to w is equivalent to 

min ||w,|| + £|| Wj - {D lU k - X k //3)|| 2 , i = 1, 2, . . . , n 2 , 
the solutions of which are given by 

w* +1 = maxjllAtt* - >Hn - l.o} g^ffj, . (2.13) 

Second, for fixed u fe and A fe , the minimization of (|2.12p with respect to u is approximated by linearizing 

\\\Au — /|| 2 and adding a proximal term as in (j2.4j) . resulting the following problem 

m M in E, (-( A ") T K fe+1 - A«) + ^||w* +1 - A«|| 2 ) + £||u - (u k - rg k )f, (2.14) 

where is defined in (|2.4j) . It is easy to show that the normal equations of (|2.14|) are of the form 

(d t D + iLj) u = D T (w k+1 - \ k /f3) + - rg k ). (2.15) 

Under the periodic boundary conditions, the exact solution of (|2.15|) can be attained by two FFTs. Finally, 
A is updated via 

A fc+i = X k _ p( w k+i _ Du k+i^ (2.I6) 
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We note that the linearization technique makes the i^-subproblem of (|2.12p is solved inexactly. Therefore, 
we name the above iterative framework as an inexact ADM or I ADM, which is summarized below. 
Algorithm 2 (IADM). Input f , A and /i,/3,r > 0. Initialize u° = f and k = 0. 
While "not converged" , Do 

1) Compute w k+l according to \2.13\) for fixed X = X k and u = u k . 

2) Compute u k+1 according to $2.15]) for fixed X = X k and w = w k+1 . 

3) Update X via (|2.16p and set k = k + 1. 
End Do 

We have the following convergence results for Algorithm [2j 

Theorem 2.4. Under Assumption^ the sequence {(w k , u k )} generated by Algorithm^ from any starting 
point (w°,u°) converges to a solution of (|2.1ip . 

A closer examination shows that Algorithm [2] is related to the proximal ADM of He et al. [20] for solving 
monotone variational inequalities. Hence, the global convergence is followed directly, see Appendix [B] for 
details. 

3. Numerical experiments. In this section, we present numerical results to illustrate the feasibility 
and efficiency of FTVCS and its accelerated variant I AMD. All experiments were accomplished in Mat- 
lab 2009a running on a PC (Intel Pentium(R) 4, 1.6 GHz, 1.0GB SDRAM) with Windows XP operating 
system. As usual, we measure the quality of reconstruction by relative error to the original image u, i.e., 

RE = ^ffi x 100%. 

INI 

In the following, we first present primary experimental results to show the feasibility of both algorithms, and 
then compare both algorithms with TwIST — a state-of-the-art algorithm for solving (|1.3|) . to demonstrate 
their efficiency. 

3.1. Test on FTVCS and IADM. In the first experiment, we present reconstruction results of 
both algorithms to illustrate their feasibility for solving (jl.3|) . We used a random matrix with independent 
identical distributed Gaussian entries as CS encoding matrix and tested the Shepp-Logan phantom image, 
which has been widely used in simulations for TV models. Due to storage limitations, we tested the image 
size 64 x 64. The sample ratio in this test is 30%, which are selected uniformly at random. Besides, 
we added Gaussian noise of zero mean and standard deviation a = 0.001. Similar as in FTVd [33], we 
implemented FTVCS with a continuation scheme on j3 to speed up convergence. Specifically, we tested the 
/3-sequence {2 4 , 2 5 , 2 6 , 2 7 } and used the warm-start technique. In IADM, the value of f3 is fixed to be 8. In 
both algorithms, the weighting parameter \i was set to be 200. Both algorithms were terminated when the 
relative change between successive iterates fell below 10 -3 , i.e., 

< io- 3 |K_i||. (3.1) 

The original image, the initial guess, and the reconstructed ones by both algorithms are listed in Figure I3TT1 
As is shown in Figure 13. 1[ both algorithms perform favorably and produce faithful recovery results in a 
few seconds. We note that the per-iteration cost of both algorithms is one shrinkage operation, two matrix- 
vector multiplications and two FFTs. The results also indicate that the inexact ADM approach described in 
Algorithm [2] is indeed more efficient than the penalty approach FTVCS described in Algorithm [T] in the sense 
that better recovery results were obtained in less CUP seconds. To closely examine the convergence behavior 
of both algorithms, we present in Figure [32] the decreasing of objective function values and relative errors 
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Original 



FTVCS: 13.31%, 18.25s 



IADM:5.68%, 12.53s 
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\ ; \ ; 
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Fig. 3.1. Reconstruction results of FTVCS and IAMD. Original (left, 64 x Initial guess (middle left); Recovered by 
FTVCS (middle right, RE=13.31%, CPU time 18.25s); Recovered by IAMD (right, RE=5.68%, CPU time 12.53s) 

as CPU time proceeded. It is clear from Figure [321 that both algorithms generated decreasing sequences of 
function values. From the right-hand plot, I ADM achieved a solution of lower relative error. In both plots, 
the curves of I ADM fall bellow those of FTVCS throughout the whole iteration process. 




5 10 15 20 5 10 15 20 

CPU time (sec) CPU time (sec) 



Fig. 3.2. Convergence behavior of FTVCS and I ADM. Left: objective function; Right: relative error. In both plots, the 
horizontal axes denote CPU time in seconds. 



3.2. Comparison with TwIST. In this subsection, we present extensive numerical results to compare 
I ADM with TwIST [3] — a two-step iterative shringkage/thresholding algorithm for solving a class of opti- 
mization problems arising from image restoration, reconstruction and linear inverse problem^]. Specifically, 
TwIST is designed to solve 

min J Xu) + ^\\Au-b\\ 2 , (3.2) 

where J{-) is a general regularizer, which can be either the ^i-norm or the TV semi- norm, as well as others. 
In the comparison, we used partial discrete cosine transform (DCT) matrix as CS encoder, i.e., the m rows 
of A were chosen uniformly at random from the nxn DCT matrix. Since the DCT matrix is implicity stored 
as fast transforms, this enables us to test larger images. We used the default parametric settings for TwIST 
and terminated it as the relative change in objective function values fell below tol = 10 -3 . The parameters 
in IADM were set as follows: r = 1.9 and f3 = 2 6 . To obtain higher quality images, we used more stringent 
stopping tolerance and terminated IADM when \\uk — Uk-i\\ < 5 x 10 _5 || 2x^-1 1| was satisfied. 

We first compared IADM with TwIST using the Shepp-Logan phantom benchmark image of size 128 x 
128. We randomly selected 30% DCT coefficients and added Gaussian noise of mean zero and standard 



1 The Matlab code of TwIST can be obtained from http://www.lx.it.pt/~bioucas/TwIST/TwIsfThtm] 
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Table 3.1 

Comparison results of IADM and TwIST with different ji. 





TwIST 


IADM 


M 


RE 


Obj 


Iter 


Time 


RE 


Obj 


Iter 


Time 


100 


3.79% 


736.10 


40 


26 


51 


4.80% 


721.95 


140 


12.73 


500 


3.74% 


804.20 


55 


29 


23 


3.37% 


786.99 


219 


19.26 


600 


3.93% 


810.80 


56 


29 


02 


3.43% 


794.20 


247 


21.47 


700 


3.97% 


814.32 


62 


33 


27 


3.56% 


798.19 


271 


23.34 


800 


3.90% 


815.41 


69 


33 


17 


3.59% 


800.68 


297 


23.16 


900 


4.06% 


818.17 


65 


31 


92 


3.51% 


803.09 


310 


26.97 


1000 


4.71% 


817.52 


71 


29 


72 


3.81% 


803.00 


354 


25.86 


2000 


4.45% 


831.86 


91 


43 


41 


3.99% 


817.08 


602 


53.22 


3000 


5.40% 


926.04 


86 


34 


55 


4.54% 


816.49 


845 


75.22 


4000 


4.40% 


830.97 


122 


47 


23 


4.23% 


817.81 


1066 


84.11 


5000 


4.54% 


873.51 


146 


61 


06 


4.23% 


821.52 


1303 


116.03 


6000 


4.51% 


858.31 


182 


77 


19 


4.33% 


822.30 


1581 


132.27 


7000 


4.61% 


852.42 


200 


89 


05 


4.39% 


821.55 


1742 


157.08 


8000 


4.10% 


832.80 


288 


139 


17 


4.42% 


821.31 


1929 


177.05 


9000 


4.22% 


830.45 


315 


135 


16 


4.59% 


819.54 


2165 


170.30 


10000 


4.10% 


828.36 


336 


156 


98 


4.47% 


817.46 


2405 


245.77 



deviation 0.001. Table 13^21 reports the detailed results of both algorithms for different values of /i, where RE, 
Obj, Iter and Time represent, respectively, the relative error of the reconstructed image to the original one, 
the final objective function value, the number of iterations, and the consumed CPU time in seconds. 

It can be seen from Table [32] that, for both algorithms, the number of iterations becomes larger and 
larger as \i increases, and as a result longer CPU time is consumed. For larger /i, the performance of both 
algorithms deteriorates, while the resulting relative errors were not improved. For \i between 500 and 7000, 
IADM always obtained comparable or higher recovery quality than TwIST. For \i between 500 and 1000, 
IADM is also faster than TwIST. In terms of final function values, IADM obtained slightly smaller ones than 
those of TwIST. 

Besides the Shepp-Logan phantom image, we also tested Cameraman, Lena, Boat, Sailboat, as well as 
two brain images. In this experiment, we simply set \i = 500 and keep all other parameters unchanged. The 
original and the recovered images by TwIST and IADM are given in Figures [331 and [3^4| and detailed results 
including relative errors (RE), CPU time (Time), final objective function values (Obj), and the number of 
iterations (Iter) are presented in Table 13.21 It can be seen from Table 13.21 that IADM attained comparable 
or better image quality in less CPU seconds. For each test, IADM consumed more iterations while the 
CPU time is less because the per-iteration cost of IADM is much less than that of TwIST. Specifically, the 
per-iteration cost of IADM contains two matrix- vector multiplications and two FFTs, while TwIST needs to 
solve a TV denoising problem at each iteration. In addition, IADM always attained smaller function values. 
In summary, the comparison results indicate that IAMD performs favorably and can be competitive with 
the state-of-the-art algorithm TwIST. 

4. Concluding Remarks. In this paper, we proposed a Fast alternating minimization algorithm for 
Total Variation image reconstruction from Compressive Sensing data (FTVCS). The per-iteration cost of 
FTVCS includes a linear time shrinkage operation, two matrix-vector multiplications and two FFTs. To 
overcome the difficulty caused by large penalty parameter in FTVCS, we have also developed an Inexact 
Alternating Direction Method (IADM) based on linearization and proximal techniques. Our experimental 
results indicate that IADM indeed performs better than FTVCS and is comparable with the state-of-the-art 
algorithm TwIST for solving TV reconstruction models. 
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Fig. 3.3. Original and recovered images by IADM and TwIST. From top to bottom: brain 1, brain 2, cameraman, lena 
and man. 
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Table 3.2 

Comparison results of IADM and TwIST with different images. 





TwIST 


IADM 


Images 


Size 


Iter 


RE 


Time 


Obj 


Iter 


Re 


Time 


Obj 


brain 1 


128 X 128 


52 


14.01% 


34.25s 


4.7831e+002 


208 


13.64% 


20.41s 


4.5478e+002 


brain 2 


256 X 256 


48 


9.59% 


90.22s 


1.6397e+003 


176 


9.45% 


61.67s 


1.5665e+003 


cameraman 


256 x 256 


56 


5.71% 


122.22s 


2.9068e+003 


257 


5.59% 


118.67s 


2.7822e+003 


lena 


256 x 256 


53 


4.93% 


121.06s 


2.3656e+003 


205 


5.01% 


92.11s 


2.2627e+003 


man 


512 x 512 


59 


8.54% 


423.38s 


1.0617e+004 


262 


8.57% 


400.81s 


1.0122e+004 


sailboat 


450 x 450 


57 


4.91% 


361.58s 


7.9220e+003 


245 


4.98% 


260.73s 


7.5960e+003 


sheppon 


512 x 512 


42 


2.62% 


335.36s 


4.4496e+003 


135 


2.09% 


217.08s 


4.2317e+003 


boat 


512 x 512 


53 


4.37% 


477.17s 


8.7306e+003 


200 


4.34% 


384.61s 


8.3312e+003 


barbara 


512 x 512 


56 


9.83% 


493.20s 


1.3469e+004 


292 


9.80% 


550.39s 


1.2814e+004 



Original: 450*450 TwIST IADM 




Fig. 3.4. Original and recovered images by IADM and TwIST. From top to bottom: sailboat, sheppon and boat. 

Given the promising performance of IADM and the wide applications of TV models, we believe that it 
is worthwhile to further accelerate IADM via certain line search strategy. In both FTVCS and IADM, we 
used a linearization technique and FFTs to obtain a new point. A possible improvement is to solve the u- 
subproblem of (j2.12|) by using certain gradient methods, e.g., gradient descent method with BB steplengths 
PQ and non-monotone line search. This should be interesting for further investigations. 
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Appendix A. Proof of Theorems EUl E2] and 1231 

The purpose of this appendix is to establish convergence properties of Algorithm Q] for a fixed /3 > 0. 
For convenience, we define some notation. For fixed f) > 0, the 2-dimensional (2D) shrinkage operator 
s : R 2 -> R 2 is defined as 

CM. 

3(a) = a-P B (a) = max {||a|| -1/0,0}^, (A.l) 

\\ a \\ 

where Vb(-) • ^ 2 —> ^ 2 is the projection onto the closed disc B = {a G R 2 : ||a|| < and the convention 

• (0/0) = is followed. For vectors u, v G R N , N > 1, we define S(u; v) : R 2N R 2N by 

S(u; v) = (s(cki); . . . ; s(a/vO), where oti = 

i.e., S applies 2D shrinkage to each pair (u^Vi), for i = 1, 2, . . . , N. From the definition of s(-), it is easy to 
see that (|2.ip or (|2.2|) can be rewritten as Wj = s(Diu). The following result shows that the operator s is 
non-expansive. 

Lemma A.l ([33 ). For any a,b eR 2 , it holds that 

\\s(a)-s(b)\\ 2 <h-b\\ 2 -\\V B (a)-V B (b)\\ 2 , 

Furthermore, if \\s(a) — s(b)\\ = \\a — b\\, then s(a) — s(b) = a — b. 

Since the objective function in ([1.6)) is convex, bounded below, and coercive (i.e., its goes to infinity 
as ||(^,^)|| —> oo), it has at least one minimizer (w*,u*) that cannot be decreased by the alternating 
minimization scheme (|2.2p ~ (|2.3p and thus must satisfy 

w* = S(DWu*i D^u*) (= S(Du*)), 

(D T D + f A T A)u* = D T w* + jjA T f. ^ ' ' 
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(A.3) 



By using the shrinkage operator, we can rewrite the iteration of Algorithm [T] as 

( w k+i = S{D^u k ;D^u k ) S(Du k )), 
{ (D T D + f ? I)u k + 1 = D T w k + 1 + fi(u k - rg k ). 

In the following, we show that (|A.3|) converges to (|A.2|) . The following matrices will be used in our 
analysis: 



M = D D 



^-A T A, H = D T D + -^-I, and T = I — rA T A. 



Assumption [T] ensures the non-singularity of M, while H -1 is always well defined under the circumstance. 
Simple manipulation shows that H — M = ?? 2 T, where r] = \Jj^f- With these definitions, ()A.2|) and (|A.3|) 
can be, respectively, simplified as 



= S(Du*), 
v* = r]Tu* +rjTA T /, 



and 



w 



k+1 = S(Du k ), 



„fc+i - 



r]Tu k + t]tA t f, 



Hu k+1 = D T w k+1 



■ rjv 



fe+i 



To further simplify the above equations, we define 

T / \ 



v 



and u) = r]TH 1 



77/ 



Hence, the solution and iteration systems can be, respectively, rewritten as 

w* =Soh(w*;v*), 
v* = p(w*;v*) + r]rA T /, 
i2V = D T w* 



and 



w k+i = s o h(w k ; v k ), 



p(w k ;v k ) + t]tA t f. 



Hu k+1 = Z} T K; fc+1 + r/^ 
where "o" denotes operator composition. Furthermore, we define 



fc+i 



5 o u) 







Then (IA.4D and (IA.5D become 



(w*; v*) = q(w*;v*) 
Hu* = L> T w* +7/v*, 



and 



(i^ /c+1 ; v /c+1 ) = q(w k ;v k ) 
Hu k+1 = D T w k+1 + 



Lemma A. 2. q(w;v) is non- expansive. 

Proof. Given (iu 1 ; v 1 ) and (k; 2 ; v 2 ), it holds that 



\\q(w ;v )-q(w \v 



w 

V 



Sohiw 1 -^ 1 ) -Soh(w 2 ;v 2 ) 
p(w 1 ; v 1 ) — p(w 2 ; v 2 ) 

< \\hiwW) - h(w 2 ;v 2 )f + Mw'^-piw 2 ^ 2 



(A.4) 



(A.5) 



R 
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where "<" comes from the non-expansive of s(-) and 



D 



It is easy to verify that 



D 
rjl 



R T R 



D 
rjl 

D 
rjl 

D 
rjl 



H~ 1 (D T D^r] 2 T 2 )H 



2rji2\ TT — 1 



D 
rjl 



^-^(2A T A-T(A T A) 2 )^jH- 1 l ^ 



D 
rjl 



a I D 

P I 7)1 



ff-if " \ " \H-\2A Y A-r(A Y A) 2 )H- 1 



D 
rjl 



Recall that we require < r < 2/A max (74 T A), which ensures the positive semi-definiteness of A T A— r(A T A) 2 . 
Therefore, 



1 2 
W — W 



v 1 — v 2 



D 
rjl 



H- 1 



which shows that q(w; v) is non-expansive. □ 

Lemma A. 3. Equality holds in iA.6\) if and only if 



D 
rjl 



W — W 



v 1 — v 2 



< 



v 1 — v 2 



, (A.6) 



q(w ;v ) - q(w ;v ) 



1 2 

v — V 



Proof We note that in the proof of Lemma IA.2I there exist three "<" . Thus, equality holds in (|A.6|) 
only when all the three inequalities become "=". For simplicity, we let dw = w 1 — w 2 and dv = v 1 — v 2 . 
1. The first "<" becomes "=" if and only if 



5 o ft^ 1 ; v 1 ) -So h(w 2 ;v 2 ) = hiw 1 -^ 1 ) - h(w 2 ;v 2 ) = DH' 1 



D 
rjl 



dw 
dv 



2. The second "<" becomes "=" if and only if 

(:) T (:)™-^-'(:) T (t 

3. Let U be orthonormal and 



= 0. 



D 
rjl 



H' 1 



D 
r/I 



U l AU 



be its eigenvalue decomposition. The third "<" becomes "=" if and only if 

2 / / \ \ 2 



U 



dw 
dv 
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Since < < 1, the above equality holds only when 



Therefore, 



AU 



( dw 








\ dv 





and thus 



From 1 and (IA.7I), we have 



D ) H' 1 ( ° ) dW = dW ). (A.7) 
i]I \ rjl J \ dv I \ dv I 



Soh{w 1 ;v 1 )-Soh{w 2 -v 2 ) = DH- 1 y D i j y d ™ j = dw. (A.8) 

Prom (|A.7|I . the equality in 2 is equivalent to 

dv T (2A T A - r(A T A) 2 )dv = 0. 
Let U J AU = A T A be the eigenvalue decomposition of A T A. The above equation is equivalent to 
dv T U T (2A-rA 2 )Udv = or ^(2A; - T \ 2 ){Udv) 2 = 0. 

Since 2A^ — rA| > 0, we have (2A^ — r\f)(Udv)^ = 0,V i. If A^ 7^ 0, then from the choice of r we have 
2\i - rAf > 0, and thus (Udv)i = 0. Therefore, AUdv = and A T Adv = U T AUdv = 0. Sum the above 
discussions up, we have 



q(w L ; v l ) - q(w 2 ]V 2 ) 



( Soh(w x -v x )-Soh(w 2 -v 2 ) \ 
p(w 1 ; v 1 ) — p(w 2 ; v 2 ) J 

(dw \ I dw \ j w 1 — w 2 \ 

T -dv J ~ \ dv - A T Adv )~ \ v 1 - v 2 J ' 

where the first equality is from the definition of #(•; •); the second one is from (|A.8|h the definition of p and 
(|A.7|) ; the third one is from the definition of T; and the final one is from A T dv = 0. This completes the 
proof. □ 

Corollary A. 4. Suppose is a fixed point of q, i.e., = q(w*;v*). Then for any (w;v) 

it holds 

\\q(w;v)-q(w*;v*)\\<\\(w,v)-(w*;v*)\\ 

unless (w;v) is also a fixed point ofq(-;-). 

Based on the above lemmas, now we are ready to give the proofs of Theorems 12. 1[ 12.21 and 12.31 

Proof (Theorem 12. 2|) First, the convergence of {w k ,u k ) to (w*,u*) can be established using exactly the 

same arguments as in Theorem 3.4 in [33]. The convergence of u k to u* follows from the convergence of w k 

to w* and v k to v*. Therefore, we omit the details. □ 
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3n _^ ^ ^ _ j-^ ,n 2 }/L,, where we recall that 



L = {i : HA^II = \\hi(w*,v*)\\ < 1//3}, and 

w = min{l//3 - || A^*|| : i e L} > 0. 
Proof. (Theorem 12. 2p From the non-expansive of s(-), for each z, it holds 

l|w* +1 -w*|| = Hso^^^^-so^^*;^)!! < 11^^;^) -kiK;*' 



(A.9) 



(A.10) 



Suppose that at iteration k there exist at least one index i <E L such that vf k+1 = so hi(w k ;v k ) ^ 0. Then 
\\hi(w*;v*)\\ < 1/13, \\hi(w k ;v k )\\ > 1/(3, and w* = s o hi(w*; v*) =0. Therefore, 



(A.ll) 



l|w? +1 - w?|| = ||*o t,*)f = (\Mw k ;v k )\\ - V/?) 2 

< [\\hi(w k ;v k ) -hi(w*;v*)\\ - (l/p-\\h t (w*;v*)\\] 2 

< WWvt-y) ~ hi(w*;v*)\\ 2 - (l/p-\\h t (w*;v*)\\) 2 

< \\h i (w h -,v k )-h i (w'-,v')\\ 2 -w 2 , 

where the first "<" is the triangular inequality, the second one follows from the fact that \\hi{w k ] v k ) — 
hi(w*;v*)\\ > 1/(3 — \\hi(w* ] v*)\\ > 0, and the last one used the definition of uo in (|A.9|) . Combining with 
(1A.9|) and (IA.11|) . we obtain 



w k+l - w* 



= EK fe+1 -<n 2 + ii- fe+1 - v *ii 2 

< 11^(^5 «*) - hi(w*;v*)\\ 2 -uj 2 + \\v k+1 - v*\\ 2 

i 

= IMw*^*) - hi(w*;v*)\\ 2 -u 2 + \\p(w k ; v k ) - p{w*-X) 

2 



w k 


-w* 




-V* 



where the second "<" comes from the non-expansiveness of 

h(w; v) 



<t>(w;v) 



p(w;v) 



which can be easily derived. Therefore, the number of iterations k with + ^ does not exceed 

2 



1 


W° 


-w* 


^2 


V° 


-v* 



This completes the proof of Theorem 12.21 □ 

Proof. (Theorem 12. 3|) From the iteration formulae for u and (w;v), there holds 

T 



,k+l 



D 
rjl 



and 



w k+l - w* 
v k+i _ y * 



= \\q(w k ;v k )-q(w*;v 



* . * \ 1 1 2 



< 



w k+l - w* 

V k+l _ y* 



l k - 



u*) 


2 








u*) 
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Considering the finite convergence of w^, i G I, we have 

2 



w k E +l -wl 
v k+1 — V* 



<p((i?'i?)BB) 







w% 


-w* E 




-v* 



p3n z x3n z 



where (R t R)ee is a sub- matrix of R T R G 

UieL{M + n<2 }) an d corresponding columns. Multiplying (D; rjT) to the recursion of u k+l — u* , we get 



formed by throwing away certain rows (with indexes 



f fc+i 



T D-\-rj 2 T 2 



w k+l _ w * 

yk + 1 _ y* 



p((R t R)ee) 



R T R 



w k+l _ w * 

yk+1 _ y* 



yk+l _ y* 



< P((R T R) EE )\\u k - u*\\ 2 DTD+v2T2 , 



which shows that {u k } converges q-linearly. □ 

Appendix B. Convergence of Algorithm [21 In this section, we clarify the relationship between 
Algorithm [2] and the proximal ADM approach proposed in [20] . The convergence of Algorithm [2] follows 
directly. 

We briefly review the proximal ADM approach in [20] for structured variational inequality (SVI) prob- 
lems. Let M and N be, respectively, I x n and I x m matrixes, X C W 1 and y C lR m be nonempty closed 
convex sets, and /, g be given monotone operators. The SVI problem is to find u* G ft such that 



(u-u*) T F(u*) > 0, V u G ft, 
where ft = {(x, y) : x e X ,y e y, Mx + Ny = 0}, 



(B.l) 



u = 



x 

y 



, and F(u) = 



g(y) 



Given (x k ,y k ,\ k ), the proximal ADM proposed in [20] iterates as follows 

1. Compute x k+1 G X via solving 

(x f - x) T {f(x) - M T [X k - h(Mx + TV^)] + R k (x - x k )} > 0, V x' G A\ (B.2) 

2. Compute G 3^ via solving 

is/ - y) T {g(y) - N T [\ k - h(Mx k+1 + N y )\ + - /)} > o, Vj/'ey. (b.3) 

3. Update \ k+1 via 

A fc+i = X k _ h ( Mx k+i + Ny k+1 ), 

where h > is a parameter, and 5& are symmetric positive semidefinite matrices. Under mild assump- 
tions, global convergence of this proximal ADM approach was established in [20]. Simple manipulation shows 
that Algorithm [2] is a special case of the proximal ADM approach described above by setting R^ = in 
(|B.2|) . i.e., the the lu-subproblems are solved exactly in Algorithm [2j and Sk = — A T A in (|B.3|) . Hence, 
the global convergence of Algorithm [2] to a solution of ([2. 11)) follows from [20] Theorem 4]. 



